All the source codes of the aggregation methods are available here.
To run the codes, you can clone the repository directly, or simply load the R script source file from the repository using the devtools package in RStudio as follows:
Install the devtools package using the command:
install.packages("devtools")
Then load the source codes from the GitHub repository using the source_url function:
devtools::source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/MixCobraReg.R")
✎ Note: All codes contained in this Rmarkdown are built with a recent version of R (version \(>\) 4.1, available here) and RStudio (version > 2022.02.2+485, available here). Note also that the code chunks are hidden by default.
To see the codes, you can:
click the Code button at the top of the page, then choose Show All Code to show all the codes, or
click the Code button at each section to show the codes of that specific section.

This Rmarkdown provides the implementation of an aggregation method using an input-output trade-off by Fischer
and Mougeot (2019). Let \(\mathcal{D}_n=\{(x_1,y_1),...,(x_n,y_n)\}\)
be a training dataset of size \(n\), where
the input-output couples \((x_i,y_i)\in\mathbb{R}^d\times\mathbb{R}\)
for all \(i=1,...,n\). \(\mathcal{D}_{n}\) is first randomly
partitioned into \(\mathcal{D}_{k}\)
and \(\mathcal{D}_{\ell}\) of size
\(k\) and \(\ell\) respectively such that \(k+\ell=n\). We construct \(M\) regression estimators (machines) \(r_1,...,r_M\) using only \(\mathcal{D}_{k}\). Let \({\bf
r}(x)=(r_1(x),...,r_M(x))^T\in\mathbb{R}^M\) be the vector of
predictions of \(x\in\mathbb{R}^d\),
the kernel-based consensual aggregation method evaluated at point \(x\) is defined by
\[\begin{equation} g_n(x)=\frac{\sum_{i=1}^{\ell}y_iK_{\alpha,\beta}(x-x_i,{\bf r}(x)-{\bf r}(x_i))}{\sum_{j=1}^{\ell}K_{\alpha,\beta}(x-x_j,{\bf r}(x)-{\bf r}(x_j))} \end{equation}\] where \(K:\mathbb{R}^{d+M}\to\mathbb{R}_+\) is a non-increasing kernel function with \(K_{\alpha,\beta}(u,v)=K(\frac{u}{\alpha},\frac{v}{\beta})\) for some smoothing parameters \(\alpha,\beta>0\) to be tuned, with the convention \(0/0=0\).
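As an illustration of the definition above, here is a minimal R sketch (not the repository's implementation) of the estimator \(g_n\) with a Gaussian kernel \(K(u,v)=e^{-(\|u\|^2+\|v\|^2)/2}\); the function and argument names are hypothetical:

```r
# Sketch of g_n(x) with a Gaussian kernel, using the convention 0/0 = 0.
mixcobra_predict <- function(x, r_x, X_l, R_l, y_l, alpha, beta) {
  # x   : query input (length-d vector);  r_x : machine predictions r(x) (length M)
  # X_l : l x d input matrix of D_l;      R_l : l x M prediction matrix on D_l
  # y_l : responses of D_l;               alpha, beta : smoothing parameters
  du <- rowSums(sweep(X_l, 2, x)^2) / alpha^2   # ||(x - x_i)/alpha||^2
  dv <- rowSums(sweep(R_l, 2, r_x)^2) / beta^2  # ||(r(x) - r(x_i))/beta||^2
  w  <- exp(-(du + dv) / 2)                     # K_{alpha,beta} weights
  if (sum(w) == 0) return(0)                    # convention 0/0 = 0
  sum(w * y_l) / sum(w)                         # weighted average of y_i
}
```

Points of \(\mathcal{D}_{\ell}\) that are close to \(x\) both in the input space and in the space of machine predictions receive the largest weights.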
We prepare all the necessary tools for this Rmarkdown. The pacman package allows us to load (if already installed) or install (if not yet installed) any available package from The Comprehensive R Archive Network (CRAN) of R.
# Check if package "pacman" is already installed
lookup_packages <- installed.packages()[,1]
if(!("pacman" %in% lookup_packages)){
install.packages("pacman")
}
# To be installed or loaded
pacman::p_load(magrittr)
pacman::p_load(ggplot2)
pacman::p_load(tidyverse)
## package for "generateMachines"
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
pacman::p_load(keras)
pacman::p_load(pracma)
pacman::p_load(latex2exp)
pacman::p_load(plotly)
pacman::p_load(dash)
rm(lookup_packages)
This section provides functions to generate basic machines (regressors) to be aggregated.
setBasicParameter_Mix

This function allows us to set the values of some key parameters of the basic machines.
Argument:

lambda : the penalty parameter \(\lambda\) used in penalized linear models: ridge or lasso.
k : the parameter \(k\) of the \(k\)NN (knn) regression model. By default, k = 5.
ntree : the number of trees in random forest (rf). By default, ntree = 300.
mtry : the number of random features chosen in each split of the random forest procedure. By default, mtry = NULL and the default value of mtry of the randomForest function from the randomForest library is used.
eta_xgb : the learning rate \(\eta>0\) in the gradient step of the extreme gradient boosting method (xgb) of the xgboost library.
nrounds_xgb : the parameter nrounds indicating the maximum number of boosting iterations. By default, nrounds_xgb = 100.
early_stop_xgb : the early stopping round criterion of the xgboost function. By default, early_stop_xgb = NULL and the early stopping function is not triggered.
max_depth_xgb : the maximum depth of trees constructed in xgboost.

Value:

This function returns a list of all the parameters given in its arguments, to be fed to the basicMachineParam argument of the function generateMachines_Mix defined in the next section.

Remark.1:

lambda, k and ntree can be a single value or a vector. In other words, each type of model can be constructed several times according to the given values of its parameters.
setBasicParameter_Mix <- function(lambda = NULL,
k = 5,
ntree = 300,
mtry = NULL,
eta_xgb = 1,
nrounds_xgb = 100,
early_stop_xgb = NULL,
max_depth_xgb = 3){
return(list(
lambda = lambda,
k = k,
ntree = ntree,
mtry = mtry,
eta_xgb = eta_xgb,
nrounds_xgb = nrounds_xgb,
early_stop_xgb = early_stop_xgb,
max_depth_xgb = max_depth_xgb)
)
}
generateMachines_Mix

This function generates all the basic machines to be aggregated.
Argument:

train_input : a matrix or data frame of the training input data.
train_response : a vector of the training response variable corresponding to the train_input.
scale_input : logical value specifying whether to scale the input data (to be between \(0\) and \(1\)) or not. By default, scale_input = TRUE.
scale_machine : logical value specifying whether to scale the predictions of the remaining part \(\mathcal{D}_{\ell}\) of the training data (to be between \(0\) and \(1\)) or not. By default, scale_machine = TRUE.
machines : types of basic machines to be constructed. It is a subset of {“lasso”, “ridge”, “knn”, “tree”, “rf”, “xgb”}. By default, machines = NULL and all the six types of basic machines are built.
splits : real number between \(0\) and \(1\) specifying the proportion of training data used to train the basic machines (\(\mathcal{D}_k\)). The remaining proportion of (\(1-\) splits) is used for the aggregation (\(\mathcal{D}_{\ell}\)). By default, splits = 0.5.
basicMachineParam : the option used to set up the values of the parameters of each machine. One should feed the function setBasicParameter_Mix() defined above to this argument.

Value:

This function returns a list of the following objects.

fitted_remain : the predictions of the remaining part (\(\mathcal{D}_{\ell}\)) of the training data used for the aggregation.
models : all the constructed basic machines (it contains only the values of parameter \(k\) for knn).
id2 : a logical vector of size equal to the number of rows of the training data, indicating the location of the points used to build the basic machines (FALSE) and the remaining ones (TRUE).
train_data : a list of the following objects:
train_input : the training input data (scaled or not accordingly).
predict_remain_org : the predictions of the second part \({\cal D}_{\ell}\) of the training data without scaling.
train_response : the training response variable.
min_machine, max_machine : vectors of minimum and maximum predicted values on the remaining part \(\mathcal{D}_{\ell}\) of the training data. They are NULL if scale_machine = FALSE.
min_input, max_input : vectors of minimum and maximum values of each variable of the training input. They are NULL if scale_input = FALSE.

✎ Note: You may need to modify the function accordingly if you want to build different types of basic machines.
generateMachines_Mix <- function(train_input,
train_response,
scale_input = TRUE,
scale_machine = TRUE,
machines = NULL,
splits = 0.5,
basicMachineParam = setBasicParameter_Mix()){
lambda = basicMachineParam$lambda
k <- basicMachineParam$k
ntree <- basicMachineParam$ntree
mtry <- basicMachineParam$mtry
eta_xgb <- basicMachineParam$eta_xgb
nrounds_xgb <- basicMachineParam$nrounds_xgb
early_stop_xgb <- basicMachineParam$early_stop_xgb
max_depth_xgb <- basicMachineParam$max_depth_xgb
# Packages
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
# pacman::p_load(keras)
# Preparing data
input_names <- colnames(train_input)
input_size <- dim(train_input)
df_input <- train_input_scale <- train_input
maxs <- mins <- NULL
if(scale_input){
maxs <- map_dbl(.x = df_input, .f = max)
mins <- map_dbl(.x = df_input, .f = min)
train_input_scale <- scale(train_input, center = mins, scale = maxs - mins)
}
if(is.matrix(train_input_scale)){
df_input <- as_tibble(train_input_scale)
matrix_input <- train_input_scale
} else{
df_input <- train_input_scale
matrix_input <- as.matrix(train_input_scale)
}
# Machines
lasso_machine <- function(x, lambda0){
if(is.null(lambda)){
cv <- cv.glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = 10^(seq(-3,2,length.out = 50)))
mod <- glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = cv$lambda.min)
} else{
mod <- glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = lambda0)
}
res <- predict.glmnet(mod, newx = x)
return(list(pred = res,
model = mod))
}
ridge_machine <- function(x, lambda0){
if(is.null(lambda)){
cv <- cv.glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = 10^(seq(-3,2,length.out = 50)))
mod <- glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = cv$lambda.min)
} else{
mod <- glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = lambda0)
}
res <- predict.glmnet(mod, newx = x)
return(list(pred = res,
model = mod))
}
tree_machine <- function(x, pa = NULL) {
mod <- tree(as.formula(paste("train_y1~",
paste(input_names, sep = "", collapse = "+"),
collapse = "",
sep = "")),
data = df_train_x1)
res <- as.vector(predict(mod, x))
return(list(pred = res,
model = mod))
}
knn_machine <- function(x, k0) {
mod <- knn.reg(train = matrix_train_x1, test = x, y = train_y1, k = k0)
res = mod$pred
return(list(pred = res,
model = k0))
}
RF_machine <- function(x, ntree0) {
if(is.null(mtry)){
mod <- randomForest(x = df_train_x1, y = train_y1, ntree = ntree0)
}else{
mod <- randomForest(x = df_train_x1, y = train_y1, ntree = ntree0, mtry = mtry)
}
res <- as.vector(predict(mod, x))
return(list(pred = res,
model = mod))
}
  xgb_machine <- function(x, nrounds_xgb0){
mod <- xgboost(data = matrix_train_x1,
label = train_y1,
eta = eta_xgb,
nrounds = nrounds_xgb0,
objective = "reg:squarederror",
early_stopping_rounds = early_stop_xgb,
max_depth = max_depth_xgb,
verbose = 0)
res <- predict(mod, x)
return(list(pred = res,
model = mod))
}
# All machines
all_machines <- list(lasso = lasso_machine,
ridge = ridge_machine,
knn = knn_machine,
tree = tree_machine,
rf = RF_machine,
xgb = xgb_machine)
# All parameters
all_parameters <- list(lasso = lambda,
ridge = lambda,
knn = k,
tree = 1,
rf = ntree,
xgb = nrounds_xgb)
if(is.null(machines)){
mach <- c("lasso", "ridge", "knn", "tree", "rf", "xgb")
}else{
mach <- machines
}
# Extracting data
M <- length(mach)
size_D1 <- floor(splits*input_size[1])
id_D1 <- logical(input_size[1])
id_D1[sample(input_size[1], size_D1)] <- TRUE
df_train_x1 <- df_input[id_D1,]
matrix_train_x1 <- matrix_input[id_D1,]
train_y1 <- train_response[id_D1]
df_train_x2 <- df_input[!id_D1,]
matrix_train_x2 <- matrix_input[!id_D1,]
# Function to extract df and model from 'map' function
extr_df <- function(x, id){
return(tibble("r_{{id}}":= as.vector(pred_m[[x]]$pred)))
}
extr_mod <- function(x, id){
return(pred_m[[x]]$model)
}
pred_D2 <- c()
all_mod <- c()
cat("\n* Building basic machines ...\n")
cat("\t~ Progress:")
for(m in 1:M){
if(mach[m] %in% c("tree", "rf")){
x0_test <- df_train_x2
} else {
x0_test <- matrix_train_x2
}
if(is.null(all_parameters[[mach[m]]])){
para_ <- 1
}else{
para_ <- all_parameters[[mach[m]]]
}
pred_m <- map(para_,
.f = ~ all_machines[[mach[m]]](x0_test, .x))
tem0 <- imap_dfc(.x = 1:length(para_),
.f = extr_df)
tem1 <- imap(.x = 1:length(para_),
.f = extr_mod)
names(tem0) <- names(tem1) <- paste0(mach[m], 1:length(para_))
pred_D2 <- bind_cols(pred_D2, as_tibble(tem0))
all_mod[[mach[m]]] <- tem1
cat(" ... ", round(m/M, 2)*100L,"%", sep = "")
}
max_M <- min_M <- NULL
pred_D2_ <- pred_D2
if(scale_machine){
max_M <- map_dbl(.x = pred_D2, .f = max)
min_M <- map_dbl(.x = pred_D2, .f = min)
pred_D2 <- scale(pred_D2, center = min_M, scale = max_M - min_M)
}
return(list(fitted_remain = pred_D2,
models = all_mod,
id2 = !id_D1,
train_data = list(train_input = train_input_scale,
train_response = train_response,
predict_remain_org = pred_D2_,
min_machine = min_M,
max_machine = max_M,
min_input = mins,
max_input = maxs)))
}
Example.1: In this example, the method is implemented on the Boston data of the MASS library. The basic machines “rf”, “knn” and “xgb” are built on the first part of the training data (\(\mathcal{D}_{k}\)), and the Root Mean Square Errors (RMSE), evaluated on the second part of the training data (\(\mathcal{D}_{\ell}\), used for the aggregation), are reported.
pacman::p_load(MASS)
df <- Boston
basic_machines <- generateMachines_Mix(train_input = df[,1:13],
train_response = df[,14],
scale_input = TRUE,
machines = c("rf", "knn", "xgb"),
basicMachineParam = setBasicParameter_Mix(lambda = 1:10/10,
ntree = 10:20 * 25,
k = c(2:10)))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
basic_machines$train_data$predict_remain_org %>%
sweep(1, df[basic_machines$id2, "medv"]) %>%
.^2 %>%
colMeans %>%
t %>%
sqrt %>%
as_tibble
This part provides functions to approximate the key parameters \((\alpha,\beta)\in(\mathbb{R}_+^*)^2\) of
the aggregation. Two important optimization methods are implemented:
gradient descent algorithm (grad) and
grid search (grid).
setGradParameter_Mix

This function allows us to set the values of the parameters needed to run the gradient descent algorithm to approximate the hyperparameters of the aggregation method.
Argument:

val_init : a 2D vector of the initial value of the gradient descent iteration. By default, val_init = NULL and the algorithm will select the best value (with the smallest cost function) among the alpha_range and beta_range values of the parameters.
rate : a 2D real-valued vector or a string specifying the learning rate of the gradient descent algorithm. By default, rate = NULL (or “auto”) and the value coef_auto = c(0.1, 0.1) will be used. It can also be a functional rate, which is a string element of {“logarithm”, “sqrtroot”, “linear”, “polynomial”, “exponential”}. Each rate is defined according to the coef_ type arguments below.
alpha_range : a range vector of \(\alpha\) values to be considered as the initial value in the gradient step. By default, alpha_range = seq(0.0001, 10, length.out = 5).
beta_range : a range vector of \(\beta\) values to be considered as the initial value in the gradient step. By default, beta_range = seq(0.1, 50, length.out = 5).
max_iter : the maximum number of iterations of the gradient descent algorithm. By default, max_iter = 300.
print_step : a logical value controlling whether to print the result of each gradient step or not, in order to keep track of the algorithm. By default, print_step = TRUE.
print_result : a logical value controlling whether to print the result of the algorithm or not. By default, print_result = TRUE.
figure : a logical value controlling whether to plot a graphic of the result or not. By default, figure = TRUE.
coef_auto : the constant learning rate used when rate = NULL. By default, coef_auto = c(0.1, 0.1).
coef_log : the coefficient multiplying the logarithmic increment of the learning rate, i.e., the rate is rate \(=\) coef_log \(\times \log(1+t)\), where \(t\) is the number of iterations. By default, coef_log = 1.
coef_sqrt : the coefficient multiplying the square root increment of the learning rate, i.e., the rate is rate \(=\) coef_sqrt \(\times \sqrt{t}\). By default, coef_sqrt = 1.
coef_lm : the coefficient multiplying the linear increment of the learning rate, i.e., the rate is rate \(=\) coef_lm \(\times t\). By default, coef_lm = 1.
deg_poly : the degree of the polynomial increment of the learning rate, i.e., the rate is rate \(=t^{\texttt{deg_poly}}\). By default, deg_poly = 2.
base_exp : the base of the exponential increment of the learning rate, i.e., the rate is rate \(=\) base_exp\(^t\). By default, base_exp = 1.5.
axes : names of the \(x\), \(y\) and \(z\)-axis respectively. By default, axes = c("alpha", "beta", "L1 norm of gradient").
title : the title of the plot. By default, title = NULL and the default title is Gradient step.
threshold : the threshold to stop the algorithm when the relative change is smaller than this value. By default, threshold = 1e-10.

Value:

This function returns a list of all the parameters given in its arguments.
setGradParameter_Mix <- function(val_init = NULL,
rate = NULL,
alpha_range = seq(0.0001, 10, length.out = 5),
beta_range = seq(0.1, 50, length.out = 5),
max_iter = 300,
print_step = TRUE,
print_result = TRUE,
figure = TRUE,
coef_auto = c(0.1,0.1),
coef_log = 1,
coef_sqrt = 1,
coef_lm = 1,
deg_poly = 2,
base_exp = 1.5,
axes = c("alpha", "beta", "L1 norm of gradient"),
title = NULL,
threshold = 1e-10) {
return(
list(val_init = val_init,
rate = rate,
alpha_range = alpha_range,
beta_range = beta_range,
max_iter = max_iter,
print_step = print_step,
print_result = print_result,
figure = figure,
coef_auto = coef_auto,
coef_log = coef_log,
coef_sqrt = coef_sqrt,
coef_lm = coef_lm,
deg_poly = deg_poly,
base_exp = base_exp,
axes = axes,
title = title,
threshold = threshold
)
)
}
gradOptimizer_Mix

This function performs the gradient descent algorithm to approximate the minimizer of any given function (convex or locally convex around its optimizer).
Argument:

obj_fun : the objective function whose minimizer is to be estimated. It should take a 2D vector as input.
setParameter : the control of the gradient descent parameters, which should be the function setGradParameter_Mix() defined earlier.

Value:

This function returns a list of the following objects:

opt_param : the observed value of the minimizer.
opt_error : the value of the optimal risk.
all_grad : the vector of all the gradients collected during the walk of the algorithm.
all_param : the vector of all the parameters collected during the walk of the algorithm.
run_time : the running time of the algorithm.

gradOptimizer_Mix <- function(obj_fun,
setParameter = setGradParameter_Mix()) {
start.time <- Sys.time()
# Optimization step:
# ==================
spec_print <- function(x, dig = 5) return(ifelse(x > 1e-6,
format(x, digit = dig, nsmall = dig),
format(x, scientific = TRUE, digit = dig, nsmall = dig)))
collect_val <- c()
gradients <- c()
if (is.null(setParameter$val_init)){
range_alp <- rep(setParameter$alpha_range, length(setParameter$beta_range))
range_bet <- rep(setParameter$beta_range, length(setParameter$alpha_range))
tem <- map2_dbl(.x = range_alp,
.y = range_bet,
.f = ~ obj_fun(c(.x, .y)))
id0 <- which.min(tem)
val <- val0 <- c(range_alp[id0], range_bet[id0])
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3))
} else{
val <- val0 <- setParameter$val_init
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3))
}
if(setParameter$print_step){
cat("\n* Gradient descent algorithm ...")
cat("\n Step\t| alpha ; beta \t| Gradient (alpha ; beta)\t| Threshold \n")
cat(" ", rep("-", 80), sep = "")
cat("\n 0 \t| ", spec_print(val0[1])," ; ", spec_print(val0[2]),
"\t| ", spec_print(grad_[1], 6), " ; ", spec_print(grad_[2], 5),
" \t| ", setParameter$threshold, "\n")
cat(" ", rep("-",80), sep = "")
}
if (is.numeric(setParameter$rate)){
lambda0 <- setParameter$rate / abs(grad_)
rate_GD <- "auto"
} else{
r0 <- setParameter$coef_auto / abs(grad_)
# Rate functions
rate_func <- list(auto = r0,
logarithm = function(i) setParameter$coef_log * log(2 + i) * r0,
sqrtroot = function(i) setParameter$coef_sqrt * sqrt(i) * r0,
linear = function(i) setParameter$coef_lm * (i) * r0,
polynomial = function(i) i ^ setParameter$deg_poly * r0,
exponential = function(i) setParameter$base_exp ^ i * r0)
rate_GD <- match.arg(setParameter$rate,
c("auto",
"logarithm",
"sqrtroot",
"linear",
"polynomial",
"exponential"))
lambda0 <- rate_func[[rate_GD]]
}
i <- 0
grad0 <- 10*grad_
if (is.numeric(setParameter$rate) | rate_GD == "auto") {
while (i < setParameter$max_iter) {
if(any(is.na(grad_))){
val0 <- c(runif(1, val0[1]*0.99, val0[1]*1.01),
runif(1, val0[2]*0.99, val0[2]*1.01))
grad_ = pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
}
val <- val0 - lambda0 * grad_
if (any(val < 0)){
val[val < 0] <- val0[val < 0]/2
lambda0[val < 0] <- lambda0[val < 0] / 2
}
if(i > 5){
sign_ <- sign(grad_) != sign(grad0)
if(any(sign_)){
lambda0[sign_] = lambda0[sign_]/2
}
}
relative <- sum(abs(val - val0)) / sum(abs(val0))
test_threshold <- max(relative, sum(abs(grad_ - grad0)))
if (test_threshold > setParameter$threshold){
val0 <- val
grad0 <- grad_
} else{
break
}
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
i <- i + 1
if(setParameter$print_step){
cat("\n ", i, "\t| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1], 5), " ; ", spec_print(grad_[2], 5),
"\t| ", test_threshold, "\r")
}
collect_val <- rbind(collect_val, val)
gradients <- rbind(gradients, grad_)
}
}
else{
while (i < setParameter$max_iter) {
if(any(is.na(grad_))){
val0 <- c(runif(1, val0[1]*0.99, val0[1]*1.01),
runif(1, val0[2]*0.99, val0[2]*1.01))
grad_ = pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
}
val <- val0 - lambda0(i) * grad_
if (any(val < 0)){
val[val < 0] <- val0[val < 0]/2
r0[val < 0] <- r0[val < 0] / 2
}
if(i > 5){
sign_ <- sign(grad_) != sign(grad0)
if(any(sign_)){
r0[sign_] <- r0[sign_] / 2
}
}
relative <- sum(abs(val - val0)) / sum(abs(val0))
test_threshold <- max(relative, sum(abs(grad_ - grad0)))
if (test_threshold > setParameter$threshold){
val0 <- val
grad0 <- grad_
}else{
break
}
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
if(setParameter$print_step){
cat("\n ", i, "\t| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1], 5), " ; ", spec_print(grad_[2], 5),
"\t| ", test_threshold, "\r")
}
i <- i + 1
collect_val <- rbind(collect_val, val)
gradients <- rbind(gradients, grad_)
}
}
opt_ep <- val
opt_risk <- obj_fun(opt_ep)
if(setParameter$print_step){
cat(rep("-", 80), sep = "")
if(sum(abs(grad_)) == 0){
cat("\n Stopped| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t|\t ", 0,
"\t\t| ", test_threshold)
}else{
cat("\n Stopped| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1]), " ; ", spec_print(grad_[2]),
"\t| ", test_threshold)
}
}
if(setParameter$print_result){
    cat("\n ~ Observed parameter: (alpha, beta) = (", opt_ep[1], ", ", opt_ep[2], ") in", i, "iterations.")
}
if (setParameter$figure) {
if(is.null(setParameter$title)){
tit <- paste("<b> L1 norm of gradient as a function of</b> (",
setParameter$axes[1],",",
setParameter$axes[2],
")")
} else{
tit <- setParameter$title
}
siz = length(collect_val[,1])
fig <- tibble(x = collect_val[,1],
y = collect_val[,2],
z = apply(abs(gradients), 1, sum)) %>%
plot_ly(x = ~x, y = ~y) %>%
add_trace(z = ~z,
type = "scatter3d",
mode = "lines",
line = list(width = 6,
color = ~z,
colorscale = 'Viridis'),
name = "Gradient step") %>%
add_trace(x = c(opt_ep[1], opt_ep[1]),
y = c(0, opt_ep[2]),
z = ~c(z[siz], z[siz]),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#5E88FC",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#5E88FC", "#38DE25")),
name = paste("Optimal",setParameter$axes[1])) %>%
add_trace(x = c(0, opt_ep[1]),
y = c(opt_ep[2], opt_ep[2]),
z = ~c(z[siz], z[siz]),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#F31536",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#F31536", "#38DE25")),
name = paste("Optimal",setParameter$axes[2])) %>%
add_trace(x = opt_ep[1],
y = opt_ep[2],
z = ~z[siz],
type = "scatter3d",
mode = 'markers',
marker = list(
size = 5,
color = "#38DE25"),
name = "Optimal point") %>%
layout(title = list(text = tit,
x = 0.075,
y = 0.925,
font = list(family = "Verdana",
color = "#5E88FC")),
legend = list(x = 100, y = 0.5),
scene = list(
xaxis = list(title = setParameter$axes[1]),
yaxis = list(title = setParameter$axes[2]),
zaxis = list( title = setParameter$axes[3])))
fig %>% print
}
end.time = Sys.time()
return(list(
opt_param = opt_ep,
opt_error = opt_risk,
all_grad = gradients,
all_param = collect_val,
run_time = difftime(end.time,
start.time,
units = "secs")[[1]]
))
}
Example.2: Approximate \[(x^*,y^*)=\text{arg}\min_{(x,y)\in\mathbb{R}^2}f(x,y),\] where \[f(x,y)=(x-1)^2(1+\sin^2(2.5(x-1)))+(y-1)^2(1+\sin^2(2.5(y-1))).\] Note that the argument val_init is crucial since \(f\) is not convex.
object_func <- function(x) sum((x-1)^2*(1+sin(2.5*(x-1))^2))
p <- tibble::tibble(x = rep(seq(-4,6, length.out = 30),30),
y = rep(seq(-3,7, length.out = 30), each = 30)) %>%
mutate(z = map2_dbl(.x = x, .y = y, .f = ~ object_func(c(.x,.y)))) %>%
plot_ly(x = ~x, y = ~y, z = ~z, type = "mesh3d") %>%
add_trace(x = 1,
y = 1,
z = 0,
type = "scatter3d", mode = "markers",
name = "Optimal point") %>%
layout(title = "Cost function")
show(p)
gd <- gradOptimizer_Mix(obj_fun = object_func,
setParameter = setGradParameter_Mix(val_init = c(2.4, 3.5),
rate = "log",
coef_auto = c(0.7, 0.7),
print_step = TRUE,
figure = TRUE,
axes = c("x", "y")))
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 2.40000 ; 3.50000 | 6.363771 ; 3.96922 | 1e-10
--------------------------------------------------------------------------------
0 | 1.9148 ; 3.0148 | 0.79847 ; 1.51397 | 92.99696
1 | 1.8183 ; 2.7215 | 1.56931 ; 11.74586 | 8.020555
2 | 1.5790 ; 1.3607 | 2.50307 ; 1.48199 | 11.00273
3 | 1.1359 ; 0.9401 | 0.33091 ; -1.2513e-01 | 11.19763
4 | 1.0707 ; 0.9796 | 0.14999 ; -4.0947e-02 | 3.779282
5 | 1.0385 ; 0.9937 | 0.078525 ; -1.2638e-02 | 0.2651093
6 | 1.0206 ; 0.9983 | 0.041395 ; -3.3626e-03 | 0.09977258
7 | 1.0106 ; 0.9996 | 0.021197 ; -7.565e-04 | 0.04640585
8 | 1.0052 ; 0.9999 | 0.010433 ; -1.421e-04 | 0.02280361
9 | 1.0025 ; 1.0000 | 0.0049263 ; -2.1917e-05 | 0.01137799
10 | 1.0011 ; 1.0000 | 0.0022329 ; -2.7076e-06 | 0.005627254
11 | 1.0005 ; 1.0000 | 0.00097291 ; -2.5805e-07 | 0.002712624
12 | 1.0002 ; 1.0000 | 0.00040805 ; -1.7849e-08 | 0.001262474
13 | 1.0001 ; 1.0000 | 0.00016495 ; -8.0023e-10 | 0.0005650944
14 | 1.0000 ; 1.0000 | 6.4339e-05 ; -1.7661e-11 | 0.000243119
15 | 1.0000 ; 1.0000 | 2.4237e-05 ; -1.2212e-14 | 0.0001006146
16 | 1.0000 ; 1.0000 | 8.8254e-06 ; 3.3307e-16 | 4.010183e-05
17 | 1.0000 ; 1.0000 | 3.1086e-06 ; -1.1102e-16 | 1.541137e-05
18 | 1.0000 ; 1.0000 | 1.0599e-06 ; -1.1102e-16 | 5.716742e-06
19 | 1.0000 ; 1.0000 | 3.5e-07 ; -1.1102e-16 | 2.048728e-06
20 | 1.0000 ; 1.0000 | 1.1199e-07 ; -1.1102e-16 | 7.09896e-07
21 | 1.0000 ; 1.0000 | 3.4741e-08 ; -1.1102e-16 | 2.380033e-07
22 | 1.0000 ; 1.0000 | 1.0452e-08 ; -1.1102e-16 | 7.72527e-08
23 | 1.0000 ; 1.0000 | 3.0504e-09 ; -1.1102e-16 | 2.428952e-08
24 | 1.0000 ; 1.0000 | 8.6399e-10 ; -1.1102e-16 | 7.401194e-09
25 | 1.0000 ; 1.0000 | 2.3754e-10 ; -1.1102e-16 | 2.18645e-09
26 | 1.0000 ; 1.0000 | 6.3406e-11 ; -1.1102e-16 | 6.264504e-10
27 | 1.0000 ; 1.0000 | 1.6436e-11 ; -1.1102e-16 | 1.741309e-10
--------------------------------------------------------------------------------
Stopped| 1.0000 ; 1.0000 | 1.6436e-11 ; -1.1102e-16 | 4.697043e-11
 ~ Observed parameter: (alpha, beta) = ( 1 , 1 ) in 28 iterations.
setGridParameter_Mix

Argument:

min_alpha : minimum value of \(\alpha\) in the grid. By default, min_alpha = 1e-5.
max_alpha : maximum value of \(\alpha\) in the grid. By default, max_alpha = 5.
min_beta : minimum value of \(\beta\) in the grid. By default, min_beta = 0.1.
max_beta : maximum value of \(\beta\) in the grid. By default, max_beta = 50.
n_alpha, n_beta : the number of \(\alpha\) and \(\beta\) values respectively in the grid. By default, n_alpha = n_beta = 30.
parameters : the list of parameters \(\alpha\) and \(\beta\) in case a non-uniform grid is considered. It should be a list of two vectors containing the values of \(\alpha\) and \(\beta\) respectively. By default, parameters = NULL and the default uniform grid is used.
axes : names of the \(x\), \(y\) and \(z\)-axis respectively. By default, axes = c("alpha", "beta", "Risk").
title : the title of the plot. By default, title = NULL and the default title is Cross-validation risk as a function of \((\alpha, \beta)\).
print_result : a logical value specifying whether to print the observed result or not.
figure : a logical value specifying whether to plot the graphic of the cross-validation error or not.

Value:

This function returns a list of all the parameters given in its arguments.
setGridParameter_Mix <- function(min_alpha = 1e-5,
max_alpha = 5,
min_beta = 0.1,
max_beta = 50,
n_alpha = 30,
n_beta = 30,
parameters = NULL,
axes = c("alpha", "beta", "Risk"),
title = NULL,
print_result = TRUE,
figure = TRUE){
return(list(min_alpha = min_alpha,
max_alpha = max_alpha,
min_beta = min_beta,
max_beta = max_beta,
n_alpha = n_alpha,
n_beta = n_beta,
axes = axes,
title = title,
parameters = parameters,
print_result = print_result,
figure = figure))
}
gridOptimizer_Mix

Argument:

obj_func : the objective function whose minimizer is to be estimated. It should take a 2D vector of positive real values as input.
setParameter : the control of the grid search algorithm parameters, which should be the function setGridParameter_Mix() defined above.

Value:

This function returns a list of the following objects:

opt_param : the observed value of the minimizer.
opt_error : the value of the optimal risk.
all_risk : the vector of all the errors evaluated at all the values of the considered parameters.
run.time : the running time of the algorithm.

gridOptimizer_Mix <- function(obj_func,
setParameter = setGridParameter_Mix()){
t0 <- Sys.time()
if(is.null(setParameter$parameters)){
param_list <- list(alpha = rep(seq(setParameter$min_alpha,
setParameter$max_alpha,
length.out = setParameter$n_alpha),
setParameter$n_beta),
beta = rep(seq(setParameter$min_beta,
setParameter$max_beta,
length.out = setParameter$n_beta),
each = setParameter$n_alpha))
} else{
param_list <- list(alpha = rep(setParameter$parameters[[1]],
length(setParameter$parameters[[2]])),
beta = rep(setParameter$parameters[[2]],
each = length(setParameter$parameters[[1]])))
}
risk <- map2_dbl(.x = param_list$alpha,
.y = param_list$beta,
.f = ~ obj_func(c(.x, .y)))
id_opt <- which.min(risk)
opt_ep <- c(param_list$alpha[id_opt], param_list$beta[id_opt])
opt_risk <- risk[id_opt]
if(setParameter$print_result){
cat("\n* Grid search algorithm...", "\n ~ Observed parameter: (alpha, beta) = (", opt_ep[1],
", ",
opt_ep[2], ")",
sep = "")
}
if(setParameter$figure){
if(is.null(setParameter$title)){
tit <- paste("<b> Cross-validation risk as a function of</b> (",
setParameter$axes[1],",",
setParameter$axes[2],
")")
} else{
tit <- setParameter$title
}
fig <- tibble(alpha = param_list$alpha,
beta = param_list$beta,
risk = risk) %>%
plot_ly(x = ~alpha, y = ~beta, z = ~risk, type = "mesh3d") %>%
add_trace(x = c(opt_ep[1], opt_ep[1]),
y = c(0, opt_ep[2]),
z = c(opt_risk, opt_risk),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#5E88FC",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#5E88FC", "#38DE25")),
name = paste("Optimal",setParameter$axes[1])) %>%
add_trace(x = c(0, opt_ep[1]),
y = c(opt_ep[2], opt_ep[2]),
z = c(opt_risk, opt_risk),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#F31536",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#F31536", "#38DE25")),
name = paste("Optimal",setParameter$axes[2])) %>%
add_trace(x = opt_ep[1],
y = opt_ep[2],
z = opt_risk,
type = "scatter3d",
mode = 'markers',
marker = list(
size = 5,
color = "#38DE25"),
name = "Optimal point") %>%
layout(title = list(text = tit,
x = 0.075,
y = 0.925,
font = list(family = "Verdana",
color = "#5E88FC")),
legend = list(x = 100, y = 0.5),
scene = list(xaxis = list(title = setParameter$axes[1]),
yaxis = list(title = setParameter$axes[2]),
zaxis = list( title = setParameter$axes[3])))
print(fig)
}
t1 <- Sys.time()
return(list(opt_param = opt_ep,
opt_error = opt_risk,
all_risk = risk,
run.time = difftime(t1,
t0,
units = "secs")[[1]])
)
}
Example.3: The same objective function of Example.2, this time approximated with grid search.
grid <- gridOptimizer_Mix(obj_fun = object_func,
setParameter = setGridParameter_Mix(min_alpha = -2,
max_alpha = 4,
min_beta = -2,
max_beta = 4,
n_alpha = 150,
n_beta = 150,
axes = c("x", "y", "z"),
title = "z = f(x,y)",
figure = TRUE))
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1.020134, 1.020134)
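The core of the grid search above is simply exhaustive evaluation of the objective over a Cartesian grid followed by `which.min`. A minimal standalone sketch (the toy objective and grid bounds here are illustrative, not part of the package):

```r
# Minimal grid search over (alpha, beta): evaluate the objective on a
# Cartesian grid and keep the argmin.
obj <- function(p) (p[1] - 1)^2 + (p[2] - 1)^2   # toy convex objective
grid <- expand.grid(alpha = seq(-2, 4, length.out = 50),
                    beta  = seq(-2, 4, length.out = 50))
risk <- apply(grid, 1, obj)
opt  <- as.numeric(grid[which.min(risk), ])
opt   # close to (1, 1), up to the grid resolution
```

The accuracy is limited by the grid step, which is why the observed optimum above (1.020134) only approximates the true minimizer.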
Constructing the aggregation method is equivalent to approximating the optimal value of the parameter \((\alpha,\beta)\in(\mathbb{R}_+^*)^2\) introduced in section 1.1 by minimizing some loss function. In this study, we propose the \(\kappa\)-fold cross-validation loss function defined by
\[\begin{equation} \label{eq:kappa} \varphi^{\kappa}(h)=\frac{1}{\kappa}\sum_{k=1}^{\kappa}\sum_{(x_j,y_j)\in F_k}(g_n(x_j)-y_j)^2 \end{equation}\] where \(F_k\) denotes the \(k\)th validation fold, for \(k=1,\dots,\kappa\).
dist_matrix_Mix
This function computes the distances between the data points of each training fold (\(\mathcal{D}_{\ell}-F_k\)) and the corresponding validation fold \(F_k\), for \(k=1,\dots,\kappa\). The \(\kappa\) distance matrices \(D_k=(d[{\bf r}(x_i),{\bf r}(x_j)])_{i,j}\), for \(k=1,\dots,\kappa\), are computed, where the distance \(d\) is defined according to the type of kernel function.
Argument:
- basicMachines : the basic machine object, which is an output of the generateMachines_Mix function.
- n_cv : the number \(\kappa\) of cross-validation folds. By default, n_cv = 5.
- kernel : the kernel function used for the aggregation, which is an element of {"gaussian", "epanechnikov", "biweight", "triweight", "triangular", "naive"}. By default, kernel = "gaussian".
Value:
This function returns a list of the following objects:
- dist : a list of data frames (tibble) containing sublists corresponding to the kernel functions used for the aggregation. Each sublist contains n_cv distance matrices \(D_k=(d[{\bf r}(x_i),{\bf r}(x_j)])_{i,j}\), for \(k=1,\dots,\kappa\), containing the distances between the data points in the validation fold (along the columns) and the \(\kappa-1\) remaining folds of the training data (along the rows). The type of distance matrix depends on the kernel used:
  - if kernel = naive, the distance matrices contain the maximum distance between data points, i.e., \[D_k=(\|{\bf r}(x_i)-{\bf r}(x_j)\|_{\max})_{i,j}\text{ for }k=1,\dots,\kappa.\]
  - if kernel = triangular, the distance matrices contain the \(L_1\) distance between data points, i.e., \[D_k=(\|{\bf r}(x_i)-{\bf r}(x_j)\|_1)_{i,j}\text{ for }k=1,\dots,\kappa.\]
- id_shuffle : the shuffled indices used in cross-validation.
- n_cv : the number \(\kappa\) of cross-validation folds.
dist_matrix_Mix <- function(basicMachines,
n_cv = 5,
kernel = "gaussian",
id_shuffle = NULL){
n <- nrow(basicMachines$fitted_remain)
n_each_fold <- floor(n/n_cv)
# shuffled indices
if(is.null(id_shuffle)){
shuffle <- 1:(n_cv-1) %>%
rep(n_each_fold) %>%
c(., rep(n_cv, n - n_each_fold * (n_cv - 1))) %>%
sample
}else{
shuffle <- id_shuffle
}
# the prediction matrix D_l
df_mach <- as.matrix(basicMachines$fitted_remain)
df_input <- as.matrix(basicMachines$train_data$train_input[basicMachines$id2,])
if(! (kernel %in% c("naive", "triangular"))){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE))^2)))))
return(res_)
}
}
if(kernel == "triangular"){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE)))))))
return(res_)
}
}
if(kernel == "naive"){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE)), 1, max)))))
return(res_)
}
}
L1 <- 1:n_cv %>%
map(.f = (\(x) pair_dist(df_input[shuffle != x,],
df_input[shuffle == x,])))
L2 <- 1:n_cv %>%
map(.f = (\(x) pair_dist(df_mach[shuffle != x,],
df_mach[shuffle == x,])))
return(list(dist_input = L1,
dist_machine = L2,
id_shuffle = shuffle,
n_cv = n_cv))
}
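The `pair_dist` helper used above can be condensed to a few lines. This sketch (with the illustrative name `pair_dist_sq`) reproduces the squared-Euclidean case: validation rows of `N` index the columns of the result, training rows of `M` index its rows, matching the layout produced by `map_dfc` above.

```r
# Condensed sketch of pair_dist (squared-Euclidean case):
# result[i, j] = squared distance between training row i of M
# and validation row j of N.
pair_dist_sq <- function(M, N){
  sapply(seq_len(nrow(N)),
         function(id) rowSums((M - matrix(N[id, ], nrow(M), ncol(M),
                                          byrow = TRUE))^2))
}
M <- matrix(c(0, 0,
              3, 4), nrow = 2, byrow = TRUE)   # two training points
N <- matrix(c(0, 0), nrow = 1)                 # one validation point
pair_dist_sq(M, N)   # distances 0 and 25 from (0,0) and (3,4)
```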
Example.3: The method
dist_matrix_Mix is implemented on the basic machines built in Example.1 with the Gaussian kernel function.
dis <- dist_matrix_Mix(basicMachines = basic_machines,
n_cv = 3,
kernel = "gaussian")
dis$n_cv
[1] 3
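The id_shuffle vector returned alongside n_cv assigns each point of \(\mathcal{D}_{\ell}\) to a fold. The construction inside dist_matrix_Mix repeats the labels 1 to n_cv, puts the remainder in the last fold, then shuffles; a minimal sketch (`make_folds` is an illustrative name):

```r
# Fold labels as constructed inside dist_matrix_Mix: repeat labels
# 1..(n_cv-1) n_each times, put the remainder in fold n_cv, shuffle.
make_folds <- function(n, n_cv){
  n_each <- floor(n / n_cv)
  sample(c(rep(1:(n_cv - 1), n_each),
           rep(n_cv, n - n_each * (n_cv - 1))))
}
set.seed(42)
f <- make_folds(10, 3)
table(f)   # folds 1 and 2 hold 3 points each, fold 3 holds 4
```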
Example.4: From the distance matrices, we can compute the error corresponding to the Gaussian kernel function, then use both optimization methods to approximate the smoothing parameter.
# Gaussian kernel
gaussian_kern <- function(.ep = c(.05, 0.005),
.dist_matrix,
.train_response2,
.inv_sigma = sqrt(.5),
.alpha = 2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(exp(- (x[1]*D1+x[2]*D2)^(.alpha/2)*.inv_sigma^.alpha))
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
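For `.alpha = 2` and `.inv_sigma = sqrt(0.5)`, the weight inside `gaussian_kern` reduces to `exp(-0.5 * (alpha*D1 + beta*D2))`, and each prediction is a weighted average of the training responses, i.e. a Nadaraya-Watson estimate on the combined input/prediction distances. A toy illustration (the numbers are made up):

```r
# One aggregation step from gaussian_kern, spelled out:
# weight = exp(-0.5 * (a*d_input + b*d_machine)) when alpha = 2
# and inv_sigma = sqrt(0.5).
y_train <- c(1, 3)       # responses of two training points
d_input <- c(0, 4)       # squared input distances to one query point
d_mach  <- c(0, 4)       # squared prediction distances to the query
a <- 0.5; b <- 0.5       # smoothing parameters (alpha, beta)
w <- exp(-0.5 * (a * d_input + b * d_mach))
y_hat <- sum(y_train * w) / sum(w)
y_hat   # between 1 and 3, pulled toward y = 1, the closer point
```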
# Kappa cross-validation error
cost_fun <- function(x,
.dist_matrix = dis,
.kernel_func = gaussian_kern,
.train_response2 = basic_machines$train_data$train_response[basic_machines$id2],
.inv_sigma = sqrt(.5),
.alpha = 2){
return(.kernel_func(.ep = x,
.dist_matrix = .dist_matrix,
.train_response2 = .train_response2,
.inv_sigma = .inv_sigma,
.alpha = .alpha))
}
# Optimization
opt_param_gd <- gradOptimizer_Mix(obj_fun = cost_fun,
setParameter = setGradParameter_Mix(rate = "linear",
print_step = TRUE,
print_result = TRUE,
figure = TRUE))
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 5.00005 ; 25.05000 | -9.41942e+01 ; 10.93488 | 1e-10
--------------------------------------------------------------------------------
0 | 5.0000 ; 25.0500 | -9.4194e+01 ; 10.93488 | 946.1615
1 | 5.1000 ; 24.9500 | -9.131e+01 ; 11.31877 | 0.006655563
2 | 5.2939 ; 24.7430 | -8.6439e+01 ; 12.06192 | 3.268387
3 | 5.5692 ; 24.4121 | -8.1095e+01 ; 13.13964 | 5.614224
4 | 5.9136 ; 23.9314 | -7.696e+01 ; 14.60412 | 6.421059
5 | 6.3221 ; 23.2636 | -7.588e+01 ; 16.67578 | 5.599537
6 | 6.8055 ; 22.3486 | -8.0208e+01 ; 19.79002 | 3.151806
7 | 7.4015 ; 21.0818 | -9.209e+01 ; 24.26427 | 7.44213
8 | 8.1837 ; 19.3066 | -1.0471e+02 ; 27.68279 | 16.35665
9 | 9.1841 ; 17.0281 | -8.3609e+01 ; 19.78119 | 16.0361
10 | 10.0717 ; 15.2191 | -4.3463e+01 ; 5.63261 | 29.00008
11 | 10.5793 ; 14.6525 | -2.7187e+01 ; 0.059944 | 54.2954
12 | 10.9257 ; 14.6459 | -2.0188e+01 ; -1.5753e+00 | 21.8482
13 | 11.2043 ; 14.8332 | -1.6161e+01 ; -1.9218e+00 | 8.633915
14 | 11.4445 ; 14.9562 | -1.2993e+01 ; -2.2793e+00 | 4.37372
15 | 11.6514 ; 15.1126 | -1.0721e+01 ; -2.3503e+00 | 3.525631
16 | 11.8335 ; 15.2845 | -9.0021e+00 ; -2.2716e+00 | 2.343102
17 | 11.9960 ; 15.4611 | -7.6493e+00 ; -2.1181e+00 | 1.79762
18 | 12.1421 ; 15.6354 | -6.553e+00 ; -1.9316e+00 | 1.506288
19 | 12.2743 ; 15.8033 | -5.6449e+00 ; -1.7357e+00 | 1.282802
20 | 12.3942 ; 15.9620 | -4.8802e+00 ; -1.5433e+00 | 1.104064
21 | 12.5030 ; 16.1102 | -4.2286e+00 ; -1.3616e+00 | 0.9569584
22 | 12.6017 ; 16.2472 | -3.6681e+00 ; -1.1938e+00 | 0.8334344
23 | 12.6913 ; 16.3727 | -3.1829e+00 ; -1.0414e+00 | 0.7282068
24 | 12.7724 ; 16.4870 | -2.7608e+00 ; -9.045e-01 | 0.6376106
25 | 12.8457 ; 16.5904 | -2.3925e+00 ; -7.8252e-01 | 0.5589861
26 | 12.9117 ; 16.6834 | -2.0703e+00 ; -6.7455e-01 | 0.4903353
27 | 12.9711 ; 16.7667 | -1.7883e+00 ; -5.7949e-01 | 0.4301108
28 | 13.0242 ; 16.8409 | -1.5413e+00 ; -4.9619e-01 | 0.3770935
29 | 13.0717 ; 16.9067 | -1.3251e+00 ; -4.2349e-01 | 0.3302988
30 | 13.1139 ; 16.9648 | -1.136e+00 ; -3.6027e-01 | 0.288921
31 | 13.1513 ; 17.0159 | -9.7093e-01 ; -3.055e-01 | 0.2522949
32 | 13.1843 ; 17.0606 | -8.2709e-01 ; -2.582e-01 | 0.2198576
33 | 13.2132 ; 17.0995 | -7.0207e-01 ; -2.175e-01 | 0.1911369
34 | 13.2386 ; 17.1333 | -5.9373e-01 ; -1.8258e-01 | 0.1657219
35 | 13.2606 ; 17.1625 | -5.0014e-01 ; -1.5273e-01 | 0.1432608
36 | 13.2797 ; 17.1877 | -4.1958e-01 ; -1.2729e-01 | 0.1234433
37 | 13.2962 ; 17.2092 | -3.5049e-01 ; -1.057e-01 | 0.1059967
38 | 13.3104 ; 17.2276 | -2.915e-01 ; -8.743e-02 | 0.09067661
39 | 13.3224 ; 17.2432 | -2.4134e-01 ; -7.203e-02 | 0.07726385
40 | 13.3327 ; 17.2564 | -1.9888e-01 ; -5.9099e-02 | 0.06556184
41 | 13.3413 ; 17.2674 | -1.6311e-01 ; -4.8284e-02 | 0.05539015
42 | 13.3486 ; 17.2767 | -1.3312e-01 ; -3.9276e-02 | 0.04658347
43 | 13.3547 ; 17.2844 | -1.0811e-01 ; -3.1805e-02 | 0.03899416
44 | 13.3597 ; 17.2908 | -8.736e-02 ; -2.5636e-02 | 0.03248192
45 | 13.3639 ; 17.2961 | -7.0229e-02 ; -2.0566e-02 | 0.02692221
46 | 13.3673 ; 17.3004 | -5.6165e-02 ; -1.6419e-02 | 0.02220052
47 | 13.3701 ; 17.3040 | -4.4682e-02 ; -1.3043e-02 | 0.01821102
48 | 13.3724 ; 17.3068 | -3.5358e-02 ; -1.0309e-02 | 0.01485913
49 | 13.3743 ; 17.3091 | -2.783e-02 ; -8.1066e-03 | 0.01205797
50 | 13.3757 ; 17.3110 | -2.1786e-02 ; -6.3411e-03 | 0.009730487
51 | 13.3769 ; 17.3125 | -1.6962e-02 ; -4.9339e-03 | 0.00780939
52 | 13.3779 ; 17.3136 | -1.3133e-02 ; -3.8184e-03 | 0.006231336
53 | 13.3786 ; 17.3146 | -1.0112e-02 ; -2.9387e-03 | 0.004944283
54 | 13.3792 ; 17.3153 | -7.7429e-03 ; -2.2498e-03 | 0.003900583
55 | 13.3796 ; 17.3159 | -5.8952e-03 ; -1.7123e-03 | 0.003058406
56 | 13.3800 ; 17.3163 | -4.4629e-03 ; -1.2958e-03 | 0.002385123
57 | 13.3802 ; 17.3166 | -3.3596e-03 ; -9.7547e-04 | 0.001848779
58 | 13.3805 ; 17.3169 | -2.5142e-03 ; -7.3013e-04 | 0.001423692
59 | 13.3806 ; 17.3171 | -1.8708e-03 ; -5.431e-04 | 0.001090749
60 | 13.3807 ; 17.3172 | -1.3842e-03 ; -4.0173e-04 | 0.0008304242
61 | 13.3808 ; 17.3173 | -1.0177e-03 ; -2.9554e-04 | 0.0006280374
62 | 13.3809 ; 17.3174 | -7.4414e-04 ; -2.159e-04 | 0.0004726614
63 | 13.3809 ; 17.3175 | -5.4074e-04 ; -1.5718e-04 | 0.0003531819
64 | 13.3810 ; 17.3175 | -3.9077e-04 ; -1.1328e-04 | 0.0002621266
65 | 13.3810 ; 17.3176 | -2.8045e-04 ; -8.1368e-05 | 0.0001938633
66 | 13.3810 ; 17.3176 | -2.0006e-04 ; -5.805e-05 | 0.000142234
67 | 13.3810 ; 17.3176 | -1.4216e-04 ; -4.1454e-05 | 0.0001037092
68 | 13.3810 ; 17.3176 | -1.0033e-04 ; -2.895e-05 | 7.449637e-05
69 | 13.3811 ; 17.3176 | -7.0178e-05 ; -2.0389e-05 | 5.433279e-05
70 | 13.3811 ; 17.3176 | -4.8963e-05 ; -1.4156e-05 | 3.871258e-05
71 | 13.3811 ; 17.3176 | -3.3906e-05 ; -9.8002e-06 | 2.744801e-05
72 | 13.3811 ; 17.3176 | -2.3017e-05 ; -6.9465e-06 | 1.941261e-05
73 | 13.3811 ; 17.3177 | -1.6108e-05 ; -4.3932e-06 | 1.374278e-05
74 | 13.3811 ; 17.3177 | -1.0776e-05 ; -2.9663e-06 | 9.46224e-06
75 | 13.3811 ; 17.3177 | -7.1342e-06 ; -2.1778e-06 | 6.758743e-06
76 | 13.3811 ; 17.3177 | -4.8438e-06 ; -1.3517e-06 | 4.430732e-06
77 | 13.3811 ; 17.3177 | -3.1541e-06 ; -9.7626e-07 | 3.116532e-06
78 | 13.3811 ; 17.3177 | -2.1778e-06 ; -8.6362e-07 | 2.065172e-06
79 | 13.3811 ; 17.3177 | -1.577e-06 ; -3.7549e-07 | 1.088909e-06
80 | 13.3811 ; 17.3177 | -7.8852e-07 ; -1.8774e-07 | 1.088909e-06
81 | 13.3811 ; 17.3177 | -6.0078e-07 ; -1.8774e-07 | 9.762629e-07
82 | 13.3811 ; 17.3177 | -4.8813e-07 ; -2.6284e-07 | 1.877429e-07
83 | 13.3811 ; 17.3177 | -3.7549e-07 ; -1.8774e-07 | 1.877429e-07
84 | 13.3811 ; 17.3177 | -2.6284e-07 ; 0e+00 | 1.877429e-07
85 | 13.3811 ; 17.3177 | -7.5097e-08 ; -1.1265e-07 | 3.003886e-07
86 | 13.3811 ; 17.3177 | 0e+00 ; 7.5097e-08 | 3.003886e-07
87 | 13.3811 ; 17.3177 | -1.1265e-07 ; 0e+00 | 2.6284e-07
88 | 13.3811 ; 17.3177 | 7.5097e-08 ; 7.5097e-08 | 1.877429e-07
89 | 13.3811 ; 17.3177 | -3.7549e-08 ; 1.8774e-07 | 2.6284e-07
90 | 13.3811 ; 17.3177 | 0e+00 ; 7.5097e-08 | 2.252914e-07
91 | 13.3811 ; 17.3177 | 1.1265e-07 ; 1.1265e-07 | 1.501943e-07
92 | 13.3811 ; 17.3177 | -3.7549e-08 ; 7.5097e-08 | 1.501943e-07
93 | 13.3811 ; 17.3177 | 1.8774e-07 ; 1.5019e-07 | 1.877429e-07
94 | 13.3811 ; 17.3177 | 7.5097e-08 ; 2.6284e-07 | 3.003886e-07
95 | 13.3811 ; 17.3177 | -1.1265e-07 ; 0e+00 | 2.252914e-07
96 | 13.3811 ; 17.3177 | -1.1265e-07 ; 1.5019e-07 | 4.505829e-07
97 | 13.3811 ; 17.3177 | 7.5097e-08 ; 1.5019e-07 | 1.501943e-07
98 | 13.3811 ; 17.3177 | 1.5019e-07 ; 1.5019e-07 | 1.877429e-07
99 | 13.3811 ; 17.3177 | -7.5097e-08 ; 0e+00 | 7.509715e-08
100 | 13.3811 ; 17.3177 | 3.7549e-08 ; -3.7549e-08 | 3.754857e-07
101 | 13.3811 ; 17.3177 | -3.7549e-08 ; 1.5019e-07 | 1.501943e-07
102 | 13.3811 ; 17.3177 | -3.7549e-08 ; 1.5019e-07 | 2.6284e-07
--------------------------------------------------------------------------------
Stopped| 13.3811 ; 17.3177 | -3.7549e-08 ; 1.5019e-07 | 2.26654e-12
~ Observed parameter: (alpha, beta) = ( 13.38107 , 17.31766 ) in 103 iterations.
# Optimization
opt_param_grid <- gridOptimizer_Mix(obj_fun = cost_fun,
setParameter = setGridParameter_Mix(min_alpha = 0.0001,
max_alpha = 20,
min_beta = 0.001,
max_beta = 100,
n_beta = 30,
n_alpha = 30,
figure = TRUE))
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (13.10348, 17.24221)
cat('* Observed parameter:\n\t - Gradient descent\t: (alpha, beta) = (',
opt_param_gd$opt_param[1],",",opt_param_gd$opt_param[2], ")",
'\t with error:', cost_fun(opt_param_gd$opt_param),
'\n\t - Grid search\t\t: (alpha, beta) = (',opt_param_grid$opt_param[1],",",
opt_param_grid$opt_param[2],
') \t with error:', cost_fun(opt_param_grid$opt_param))
* Observed parameter:
- Gradient descent : (alpha, beta) = ( 13.38107 , 17.31766 ) with error: 3283.809
- Grid search : (alpha, beta) = ( 13.10348 , 17.24221 ) with error: 3284.017
This function gathers the constructed machines and performs an optimization algorithm to approximate the smoothing parameter for the aggregation, using only the remaining part \({\cal D}_{\ell}\) of the training data.
Argument:
- train_input : a matrix or data frame of the training input data.
- train_response : a vector of the corresponding response variable of the train_input.
- machines : a vector of basic machines to be constructed. It must be a subset of {"lasso", "ridge", "knn", "tree", "rf", "xgb"}. By default, machines = NULL and all six types of basic machines are built.
- scale_input : a logical value specifying whether or not to scale the input data before building the basic regression predictors. By default, scale_input = TRUE.
- scale_machine : a logical value specifying whether or not to scale the predicted features given by all the basic regression predictors before aggregation. By default, scale_machine = TRUE.
- splits : the proportion of the training data (the proportion of \({\cal D}_k\) in \({\cal D}_n\)) used to build the basic machines. By default, splits = .5.
- n_cv : the number of cross-validation folds used to tune the smoothing parameter.
- inv_sigma, alp : the inverse normalizing constant \(\sigma^{-1}>0\) and the exponent \(\alpha>0\) of the exponential kernel \(K(x)=e^{-\|x/\sigma\|^{\alpha}}\) for any \(x\in\mathbb{R}^d\). By default, inv_sigma = \(\sqrt{1/2}\) and alp = 2, which corresponds to the Gaussian kernel.
- kernels : the kernel function or vector of kernel functions used for the aggregation. By default, kernels = "gaussian".
- optimizeMethod : the optimization methods used to learn the smoothing parameter. By default, optimizeMethod = "grad", which stands for the gradient descent algorithm; if optimizeMethod = "grid", the grid search algorithm is used instead. Note that this argument should be of the same size as the kernels argument, otherwise the "grid" method will be used for all the kernels.
- setBasicMachineParam : an option used to set the values of the parameters of the basic machines. The setBasicParameter_Mix function should be fed to this argument.
- setGradParam : an option used to set the values of the parameters of the gradient descent algorithm. The setGradParameter_Mix function should be fed to it.
- setGridParam : an option used to set the values of the parameters of the grid search algorithm. The setGridParameter_Mix function should be fed to it.
Value:
This function returns a list of the following objects:
- opt_parameters : the observed optimal parameters.
- add_parameters : other additional parameters, such as the scaling options, the parameters of the kernel functions, and the optimization methods used.
- basic_machines : the basic machine object.
fit_parameter_Mix <- function(train_input,
train_response,
train_predictions = NULL,
machines = NULL,
scale_input = TRUE,
scale_machine = TRUE,
splits = 0.5,
n_cv = 5,
inv_sigma = sqrt(.5),
alp = 2,
kernels = "gaussian",
optimizeMethod = "grad",
setBasicMachineParam = setBasicParameter_Mix(),
setGradParam = setGradParameter_Mix(),
setGridParam = setGridParameter_Mix()){
kernels_lookup <- c("gaussian", "epanechnikov", "biweight", "triweight", "triangular", "naive")
kernel_real <- kernels %>%
sapply(FUN = function(x) return(match.arg(x, kernels_lookup)))
if(is.null(train_predictions)){
mach2 <- generateMachines_Mix(train_input = train_input,
train_response = train_response,
scale_input = scale_input,
scale_machine = scale_machine,
machines = machines,
splits = splits,
basicMachineParam = setBasicMachineParam)
}else{
mach2 <- list(fitted_remain = train_predictions,
models = NULL,
id2 = rep(TRUE, nrow(train_input)),
train_data = list(train_input = train_input,
train_response = train_response,
predict_remain_org = train_predictions,
min_machine = NULL,
max_machine = NULL,
min_input = NULL,
max_input = NULL))
if(scale_machine){
min_ <- map_dbl(train_predictions, .f = min)
max_ <- map_dbl(train_predictions, .f = max)
mach2$train_data$min_machine = min_
mach2$train_data$max_machine = max_
mach2$fitted_remain <- scale(train_predictions,
center = min_,
scale = max_ - min_)
}
if(scale_input){
min_ <- map_dbl(train_input, .f = min)
max_ <- map_dbl(train_input, .f = max)
mach2$train_data$min_input = min_
mach2$train_data$max_input = max_
mach2$train_data$train_input <- scale(train_input,
center = min_,
scale = max_ - min_)
}
}
# distance matrix to compute loss function
if_euclid <- FALSE
id_euclid <- NULL
n_ker <- length(kernels)
dist_all <- list()
id_shuf <- NULL
for (k_ in 1:n_ker){
ker <- kernel_real[k_]
if(ker == "naive"){
dist_all[["naive"]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = "naive",
id_shuffle = id_shuf)
} else{
if(ker == "triangular"){
dist_all[["triangular"]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = "triangular",
id_shuffle = id_shuf)
} else{
if(if_euclid){
dist_all[[ker]] <- dist_all[[id_euclid]]
} else{
dist_all[[ker]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = ker,
id_shuffle = id_shuf)
id_euclid <- ker
if_euclid <- TRUE
}
}
}
id_shuf <- dist_all[[1]]$id_shuffle
}
# Kernel functions
# ================
# Gaussian
gaussian_kernel <- function(.ep,
.dist_matrix,
.train_response2,
.inv_sigma = inv_sigma,
.alpha = alp){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(exp(- (x[1]*D1+x[2]*D2)^(.alpha/2)*.inv_sigma^.alpha))
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Epanechnikov
epanechnikov_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1 - (x[1]*D1 + x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Biweight
biweight_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0^2/colSums(tem0^2)
y_hat[is.na(y_hat)] <- 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Triweight
triweight_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0^3/colSums(tem0^3)
y_hat[is.na(y_hat)] <- 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Triangular
triangular_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] <- 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Naive
naive_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- (as.matrix((x[1]*D1+x[2]*D2)) < 1)
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# list of kernel functions
list_funs <- list(gaussian = gaussian_kernel,
epanechnikov = epanechnikov_kernel,
biweight = biweight_kernel,
triweight = triweight_kernel,
triangular = triangular_kernel,
naive = naive_kernel)
# error for all kernel functions
error_func <- kernel_real %>%
map(.f = ~ (\(x) list_funs[[.x]](.ep = x,
.dist_matrix = dist_all[[.x]],
.train_response2 = train_response[mach2$id2])/n_cv))
names(error_func) <- kernels
# list of parameter setups
list_param <- list(grad = setGradParam,
GD = setGradParam,
grid = setGridParam)
# list of optimizer
list_optimizer <- list(grad = gradOptimizer_Mix,
GD = gradOptimizer_Mix,
grid = gridOptimizer_Mix)
optMethods <- optimizeMethod
if(length(kernels) != length(optMethods)){
warning("* kernels and optimization methods differ in sizes! Grid search will be used!")
optMethods = rep("grid", length(kernels))
}
# Optimization
parameters <- map2(.x = kernels,
.y = optMethods,
.f = ~ list_optimizer[[.y]](obj_fun = error_func[[.x]],
setParameter = list_param[[.y]]))
names(parameters) <- paste0(kernel_real, "_", optMethods)
return(list(opt_parameters = parameters,
add_parameters = list(inv_sigma = inv_sigma,
alp = alp,
opt_methods = optimizeMethod),
basic_machines = mach2))
}
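One detail worth noting in fit_parameter_Mix is that kernel names are normalized with `match.arg` against the lookup vector, so unambiguous prefixes are accepted:

```r
# Kernel names are resolved by partial matching: "gauss" expands to
# "gaussian", "ep" to "epanechnikov"; an ambiguous prefix like "tri"
# would raise an error.
lookup <- c("gaussian", "epanechnikov", "biweight", "triweight",
            "triangular", "naive")
match.arg("gauss", lookup)   # "gaussian"
match.arg("ep", lookup)      # "epanechnikov"
```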
Example.5: We approximate the smoothing parameter on the Boston data.
df <- MASS::Boston
train <- logical(nrow(df))
train[sample(length(train), floor(0.75*nrow(df)))] <- TRUE
param <- fit_parameter_Mix(train_input = df[train, 1:13],
train_response = df[train, 14],
machines = c("knn", "rf", "xgb"),
splits = .50,
kernels = c("gaussian", "biweight", "triangular"),
optimizeMethod = c("grad", "grid", "grid"),
setBasicMachineParam = setBasicParameter_Mix(k = 2:6,
ntree = 1:5*100,
nrounds_xgb = 1:5*100),
setGradParam = setGradParameter_Mix(rate = "linear",
print_step = FALSE,
coef_auto = c(0.3, 0.3)),
setGridParam = setGridParameter_Mix(min_alpha = 0.0001,
max_alpha = 3,
min_beta = 0.0001,
max_beta = 3))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
~ Observed parameter: (alpha, beta) = ( 2.911775e-23 , 31.59902 ) in 77 iterations.
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1e-04, 2.79311)
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1e-04, 0.5173241)
param$opt_parameters %>%
map_dfc(.f = ~ .x$opt_param) %>%
print
The smoothing parameter obtained from the previous section can be used to make the final predictions.
Several types of kernel functions used for the aggregation are defined in this section.
Argument:
- theta : a 2D-vector of the parameter \((\alpha, \beta)\).
- .y2 : the vector of the response variable of the second part \(\mathcal{D}_{\ell}\) of the training data, which is used for the aggregation.
- .dist1, .dist2 : the input and machine distance matrices obtained from the dist_matrix_Mix function.
- .kern : the string specifying the kernel function. By default, .kern = "gaussian".
- .inv_sig, .alp : the parameters of the exponential kernel function.
- .meth : the string of the optimization method, to be appended to the name of the kernel function if any particular kernel is used more than once (possibly with different optimization methods; the "gaussian" kernel, for example, can be used with both "grad" and "grid").
Value:
This function returns the predictions of the aggregation method evaluated with the given parameter.
kernel_pred_Mix <- function(theta,
.y2,
.dist1,
.dist2,
.kern = "gaussian",
.inv_sig = sqrt(.5),
.alp = 2,
.meth = NA){
distD <- as.matrix(theta[1]*.dist1 + theta[2]*.dist2)
# Kernel functions
# ================
gaussian_kernel <- function(D,
.inv_sigma = .inv_sig,
.alpha = .alp){
tem0 <- exp(- D^(.alpha/2)*.inv_sigma^.alpha)
y_hat <- .y2 %*% tem0/colSums(tem0)
return(t(y_hat))
}
# Epanechnikov
epanechnikov_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0/colSums(tem0)
return(t(y_hat))
}
# Biweight
biweight_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0^2/colSums(tem0^2)
y_hat[is.na(y_hat)] <- 0
return(t(y_hat))
}
# Triweight
triweight_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0^3/colSums(tem0^3)
y_hat[is.na(y_hat)] <- 0
return(t(y_hat))
}
# Triangular
triangular_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] <- 0
y_hat <- .y2 %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(t(y_hat))
}
# Naive
naive_kernel <- function(D){
tem0 <- (D < 1)
y_hat <- .y2 %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(t(y_hat))
}
# Prediction
kernel_list <- list(gaussian = gaussian_kernel,
epanechnikov = epanechnikov_kernel,
biweight = biweight_kernel,
triweight = triweight_kernel,
triangular = triangular_kernel,
naive = naive_kernel)
res <- tibble(as.vector(kernel_list[[.kern]](D = distD)))
names(res) <- ifelse(is.na(.meth),
.kern,
paste0(.kern, '_', .meth))
return(res)
}
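The `is.na` replacements inside these kernel functions implement the convention \(0/0=0\) from the definition of \(g_n\): when no training point falls inside the kernel window, `colSums(tem0)` is zero and the prediction defaults to 0. A toy illustration with the naive (uniform) kernel, on made-up distances:

```r
# Naive-kernel prediction with the 0/0 = 0 convention: query 2 has no
# neighbor with combined distance < 1, so its prediction falls back to 0.
y2 <- c(1, 3)                           # responses on D_ell
D  <- matrix(c(0.2, 0.4,                # col 1: distances to query 1
               5,   7), nrow = 2)       # col 2: distances to query 2
tem0 <- (D < 1)                         # uniform weights inside the window
y_hat <- as.vector(y2 %*% tem0 / colSums(tem0))
y_hat[is.na(y_hat)] <- 0                # replace NaN from 0/0
y_hat   # c(2, 0): mean of {1, 3} for query 1, convention 0 for query 2
```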
predict_Mix
Argument:
- fitted_models : the object obtained from the fit_parameter_Mix function.
- new_data : the testing data to be predicted.
- test_response : the testing response variable; it is optional. If it is given, the mean square error on the testing data is also computed. By default, test_response = NULL.
Value:
This function returns a list of the following objects:
- fitted_aggregate : the predictions by the aggregation methods.
- fitted_machine : the predictions given by all the basic machines.
- mse : the mean square error, computed only if the test_response argument is not NULL.
# Prediction
predict_Mix <- function(fitted_models,
new_data,
test_response = NULL){
opt_param <- fitted_models$opt_parameters
add_param <- fitted_models$add_parameters
basic_mach <- fitted_models$basic_machines
kern0 <- names(opt_param)
kernel0 <- stringr::str_split(kern0, "_") %>%
imap_dfc(.f = ~ tibble("{.y}" := .x))
kerns <- kernel0[1,] %>%
as.character
opt_meths <- kernel0[2,] %>%
as.character
new_data_ <- new_data
mat_input <- as.matrix(basic_mach$train_data$train_input)
if(!is.null(basic_mach$train_data$min_input)){
new_data_ <- scale(new_data,
center = basic_mach$train_data$min_input,
scale = basic_mach$train_data$max_input - basic_mach$train_data$min_input)
}
if(is.matrix(new_data_)){
mat_test <- new_data_
df_test <- as_tibble(new_data_)
} else {
mat_test <- as.matrix(new_data_)
df_test <- new_data_
}
if(is.null(basic_mach$models)){
pred_test_all <- new_data_
pred_test0 <- new_data
}else{
# Prediction test by basic machines
built_models <- basic_mach$models
pred_test <- function(meth){
if(meth == "knn"){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := FNN::knn.reg(train = mat_input[!basic_mach$id2,],
test = mat_test,
y = basic_mach$train_data$train_response[!basic_mach$id2],
k = built_models[[meth]][[k]])$pred)))
}
if(meth %in% c("tree", "rf")){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict(built_models[[meth]][[k]], df_test)))))
}
if(meth %in% c("lasso", "ridge")){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict.glmnet(built_models[[meth]][[k]], mat_test)))))
}
if(meth == "xgb"){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict(built_models[[meth]][[k]], mat_test)))))
}
colnames(pre) <- names(built_models[[meth]])
return(pre)
}
pred_test_all <- names(built_models) %>%
map_dfc(.f = pred_test)
pred_test0 <- pred_test_all
if(!is.null(basic_mach$train_data$min_machine)){
pred_test_all <- scale(pred_test0,
center = basic_mach$train_data$min_machine,
scale = basic_mach$train_data$max_machine - basic_mach$train_data$min_machine)
}
}
# Prediction train2
pred_train_all <- basic_mach$fitted_remain
colnames(pred_test_all) <- colnames(pred_train_all)
d_train <- dim(pred_train_all)
d_test <- dim(pred_test_all)
d_train_input <- dim(mat_input[basic_mach$id2,])
d_test_input <- dim(new_data_)
pred_test_mat <- as.matrix(pred_test_all)
pred_train_mat <- as.matrix(pred_train_all)
# Distance matrix
dist_mat <- function(kernel = "gaussian"){
res_1 <- res_2 <- NULL
if(!(kernel %in% c("naive", "triangular"))){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE))^2)))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((pred_train_mat - matrix(rep(pred_test_mat[id,],
d_train[1]),
ncol = d_train[2],
byrow = TRUE))^2)))))
}
if(kernel == "triangular"){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE)))))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(pred_train_mat - matrix(rep(pred_test_mat[id,],
d_train[1]),
ncol = d_train[2],
byrow = TRUE)))))))
}
if(kernel == "naive"){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE)), 1, max)))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(pred_train_mat - matrix(rep(pred_test_mat[id,], d_train[1]),
ncol = d_train[2],
byrow = TRUE)), 1, max)))))
}
return(list(dist_input = res_1,
dist_machine = res_2))
}
dists <- 1:length(kerns) %>%
map(.f = ~ dist_mat(kerns[.x]))
tab_nam <- table(kerns)
nam <- names(tab_nam[tab_nam > 1])
vec <- rep(NA, length(kerns))
for(id in nam){
id_ <- kerns == id
if(!is.null(id_)){
vec[id_] = add_param$opt_methods[id_]
}
}
prediction <- 1:length(kerns) %>%
map_dfc(.f = ~ kernel_pred_Mix(theta = opt_param[[kern0[.x]]]$opt_param,
.y2 = basic_mach$train_data$train_response[basic_mach$id2],
.dist1 = dists[[.x]]$dist_input,
.dist2 = dists[[.x]]$dist_machine,
.kern = kerns[.x],
.inv_sig = add_param$inv_sigma,
.alp = add_param$alp,
.meth = vec[.x]))
if(is.null(test_response)){
return(list(fitted_aggregate = prediction,
fitted_machine = pred_test0))
} else{
error <- cbind(pred_test0, prediction) %>%
dplyr::mutate(y_test = test_response) %>%
dplyr::summarise_all(.funs = ~ (. - y_test)) %>%
dplyr::select(-y_test) %>%
dplyr::summarise_all(.funs = ~ mean(.^2))
return(list(fitted_aggregate = prediction,
fitted_machine = pred_test0,
mse = error))
}
}
Example.6: Aggregation on the Boston dataset.
pred <- predict_Mix(param,
new_data = df[!train, -14],
test_response = df[!train, 14])
sqrt(pred$mse)
MixCobraReg (direct aggregation)
This function puts together all the functions above and provides the desired result of the MixCobra aggregation method.
Argument: all of its arguments are already described in the previous parts.
Value:
This function returns a list of the following objects:
fitted_aggregate : the predicted values of the aggregation method.
fitted_machine : the predicted values of all the basic machines built on \(\mathcal{D}_{k}\).
pred_train2 : the predictions on \(\mathcal{D}_{\ell}\), given by all the basic machines.
opt_parameter : the observed optimal parameters.
mse : the mean squared error evaluated on the testing data if test_response is given.
kernels : a vector of the kernel functions used.
ind_D2 : a logical vector indicating the part of the training data corresponding to \(\mathcal{D}_{\ell}\) (TRUE).
MixCobraReg <- function(train_input,
train_response,
test_input,
train_predictions = NULL,
test_response = NULL,
machines = NULL,
scale_input = TRUE,
scale_machine = TRUE,
splits = 0.5,
n_cv = 5,
inv_sigma = sqrt(.5),
alp = 2,
kernels = "gaussian",
optimizeMethod = "grad",
setBasicMachineParam = setBasicParameter_Mix(),
setGradParam = setGradParameter_Mix(),
setGridParam = setGridParameter_Mix()){
# build machines + tune parameter
fit_mod <- fit_parameter_Mix(train_input = train_input,
train_response = train_response,
train_predictions = train_predictions,
machines = machines,
scale_input = scale_input,
scale_machine = scale_machine,
splits = splits,
n_cv = n_cv,
inv_sigma = inv_sigma,
alp = alp,
kernels = kernels,
optimizeMethod = optimizeMethod,
setBasicMachineParam = setBasicMachineParam,
setGradParam = setGradParam,
setGridParam = setGridParam)
# prediction
pred <- predict_Mix(fitted_models = fit_mod,
new_data = test_input,
test_response = test_response)
return(list(fitted_aggregate = pred$fitted_aggregate,
fitted_machine = pred$fitted_machine,
pred_train2 = fit_mod$basic_machines$fitted_remain,
opt_parameter = fit_mod$opt_parameters,
mse = pred$mse,
kernels = kernels,
ind_D2 = fit_mod$basic_machines$id2))
}
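To connect the wrapper back to the estimator \(g_n\) defined at the top, here is a self-contained sketch of the kernel-based consensual aggregation at one query point, using a plain Gaussian kernel. The package's exact kernel parameterization (via inv_sigma and alp) may differ; every value below is toy data and the two "machines" are hypothetical:

```r
set.seed(42)
x_l <- matrix(rnorm(20), ncol = 2)          # D_ell inputs (10 x 2), toy
y_l <- rowSums(x_l) + rnorm(10, sd = 0.1)   # D_ell responses, toy
r_l <- cbind(y_l + rnorm(10, sd = 0.2),     # predictions of two
             y_l + rnorm(10, sd = 0.2))     # hypothetical machines on D_ell
x_new <- c(0, 0)                            # query point
r_new <- c(0, 0)                            # machine predictions at x_new (toy)
alpha <- 1; beta <- 1                       # smoothing parameters to tune
# Gaussian weights K_{alpha,beta}(x_new - x_i, r(x_new) - r(x_i))
w <- exp(- rowSums(sweep(x_l, 2, x_new)^2) / alpha^2
         - rowSums(sweep(r_l, 2, r_new)^2) / beta^2)
# Weighted average of D_ell responses, with the convention 0/0 = 0
g_n <- if (sum(w) > 0) sum(w * y_l) / sum(w) else 0
```

Small alpha concentrates the weights on points close to x_new in the input space; small beta concentrates them on points where the machines agree with their predictions at x_new.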
Example.7 A complete aggregation is implemented on the Abalone dataset. Three types of basic machines and three different kernel functions are used.
pacman::p_load(readr)
colname <- c("Type", "LongestShell", "Diameter", "Height", "WholeWeight", "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings")
df <- readr::read_delim("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data", col_names = colname, delim = ",", show_col_types = FALSE)
train <- logical(nrow(df))
train[sample(length(train), floor(0.75*nrow(df)))] <- TRUE
agg <- MixCobraReg(train_input = df[train, 2:8],
train_response = df$Rings[train],
test_input = df[!train,2:8],
test_response = df$Rings[!train],
n_cv = 3,
machines = c("knn", "rf", "xgb"),
splits = .5,
kernels = c("gaussian",
"naive",
"triangular"),
optimizeMethod = c("grad",
"grid",
"grid"),
setBasicMachineParam = setBasicParameter_Mix(k = c(2,5,7,10),
ntree = 2:5*100,
nrounds_xgb = c(1,3,5)*100),
setGradParam = setGradParameter_Mix(rate = "linear",
coef_auto = c(0.5, 0.5),
print_step = TRUE,
print_result = TRUE,
figure = TRUE),
setGridParam = setGridParameter_Mix(parameters = list(alpha = seq(0.0001, 3, length.out = 20),
beta = seq(0.0001, 5, length.out = 20))))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 10.00000 ; 50.00000 | 1.187664 ; -1.1801e-01 | 1e-10
--------------------------------------------------------------------------------
0 | 10.0000 ; 50.0000 | 1.18766 ; -1.1801e-01 | 11.75103
1 | 9.5000 ; 50.5000 | 1.19509 ; -1.0347e-01 | 0.01666667
2 | 8.4937 ; 51.3768 | 1.20438 ; -8.4406e-02 | 0.03138422
3 | 6.9726 ; 52.4497 | 1.19873 ; -7.8449e-02 | 0.04332695
4 | 4.9540 ; 53.7793 | 1.14367 ; -9.758e-02 | 0.05634606
5 | 2.5466 ; 55.8465 | 0.97947 ; -1.4157e-01 | 0.07618612
6 | 0.07249 ; 59.4455 | 0.49258 ; -1.9466e-01 | 0.2081879
7 | 0.03624 ; 65.2190 | 1.11185 ; 0.058122 | 0.5399887
8 | 0.01812 ; 63.2488 | 0.89225 ; -2.4679e-02 | 0.8720525
9 | 0.009061 ; 63.7194 | 0.93845 ; -5.7262e-03 | 0.3024014
10 | 0.004531 ; 63.7800 | 0.94255 ; -3.6737e-03 | 0.06515147
11 | 0.002265 ; 63.8228 | 0.94595 ; -2.1278e-03 | 0.006156164
12 | 0.001133 ; 63.8499 | 0.94826 ; -1.1203e-03 | 0.004945072
13 | 0.0005663 ; 63.8653 | 0.94962 ; -5.3751e-04 | 0.003316515
14 | 0.0002832 ; 63.8733 | 0.95032 ; -2.3565e-04 | 0.001942763
15 | 0.0001416 ; 63.8770 | 0.95065 ; -9.4773e-05 | 0.001009418
16 | 7.079e-05 ; 63.8786 | 0.95079 ; -3.5408e-05 | 0.0004686813
17 | 3.54e-05 ; 63.8793 | 0.95084 ; -1.2466e-05 | 0.0001944641
18 | 1.77e-05 ; 63.8795 | 0.95085 ; -4.5809e-06 | 7.254384e-05
19 | 8.849e-06 ; 63.8796 | 0.95086 ; -1.6521e-06 | 2.391844e-05
20 | 4.424e-06 ; 63.8796 | 0.95086 ; -7.5097e-07 | 8.14804e-06
21 | 2.212e-06 ; 63.8797 | 0.95086 ; -7.5097e-08 | 2.252914e-06
22 | 1.106e-06 ; 63.8797 | 0.95086 ; -1.5019e-07 | 1.389297e-06
23 | 5.531e-07 ; 63.8797 | 0.95086 ; -7.5097e-08 | 5.2568e-07
24 | 2.765e-07 ; 63.8797 | 0.95086 ; 0e+00 | 1.126457e-07
25 | 1.383e-07 ; 63.8797 | 0.95086 ; -7.5097e-08 | 2.252914e-07
26 | 6.913e-08 ; 63.8797 | 0.95086 ; -2.2529e-07 | 2.252914e-07
27 | 3.457e-08 ; 63.8797 | 0.95086 ; -3.7549e-08 | 2.6284e-07
28 | 1.728e-08 ; 63.8797 | 0.95086 ; -1.1265e-07 | 1.877429e-07
29 | 8.641e-09 ; 63.8797 | 0.95086 ; -7.5097e-08 | 2.6284e-07
30 | 4.321e-09 ; 63.8797 | 0.95086 ; -3.7549e-08 | 1.126457e-07
31 | 2.16e-09 ; 63.8797 | 0.95086 ; 0e+00 | 7.509715e-08
32 | 1.08e-09 ; 63.8797 | 0.95086 ; 0e+00 | 3.754857e-08
33 | 5.401e-10 ; 63.8797 | 0.95086 ; 0e+00 | 3.754857e-08
34 | 2.7e-10 ; 63.8797 | 0.95086 ; 1.5019e-07 | 1.501943e-07
35 | 1.35e-10 ; 63.8797 | 0.95086 ; 0e+00 | 3.003886e-07
36 | 6.751e-11 ; 63.8797 | 0.95086 ; 3.7549e-08 | 2.6284e-07
37 | 3.376e-11 ; 63.8797 | 0.95086 ; -7.5097e-08 | 7.509715e-08
38 | 1.688e-11 ; 63.8797 | 0.95086 ; 3.7549e-08 | 1.877429e-07
39 | 8.439e-12 ; 63.8797 | 0.95086 ; 0e+00 | 1.877429e-07
40 | 4.219e-12 ; 63.8797 | 0.95086 ; 0e+00 | 1.501943e-07
41 | 2.11e-12 ; 63.8797 | 0.95086 ; -7.5097e-08 | 7.509715e-08
42 | 1.055e-12 ; 63.8797 | 0.95086 ; -3.7549e-08 | 1.501943e-07
43 | 5.274e-13 ; 63.8797 | 0.95086 ; 0e+00 | 1.877429e-07
44 | 2.637e-13 ; 63.8797 | 0.95086 ; 0e+00 | 3.754857e-08
45 | 1.319e-13 ; 63.8797 | 0.95086 ; 0e+00 | 3.754857e-08
46 | 6.593e-14 ; 63.8797 | 0.95086 ; 0e+00 | 7.509715e-08
--------------------------------------------------------------------------------
Stopped| 3.296e-14 ; 63.8797 | 0.95086 ; 0e+00 | 5.160414e-16
~ Observed parameter: (alpha, beta) = ( 3.296455e-14 , 63.87967 ) in 47 iterations.
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1.105326, 4.473695)
Warning in eval(f) : restarting interrupted promise evaluation
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (0.1579895, 0.7895579)
sqrt(agg$mse)